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Abstract 

In  this  paper,  we  compare  two  computationally  efficient  approximation  methods  for  the  estimation 
of  growth  rate  distributions  in  size-structured  population  models.  After  summarizing  the  underlying  the¬ 
oretical  framework,  we  present  several  numerical  examples  as  validation  of  the  theory.  Furthermore, 
we  compare  the  results  from  a  spline  based  approximation  method  and  a  delta  function  based  approx¬ 
imation  method  for  the  inverse  problem  involving  the  estimation  of  the  distributions  of  growth  rates 
in  size-structured  mosquitofish  populations.  Convergence  as  well  as  sensitivity  of  the  estimates  with 
respect  to  noise  in  the  data  are  discussed  for  both  approximation  methods. 
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1  Introduction  and  Theoretical  Background 


In  this  paper  we  consider  approximation  methods  for  a  general  class  of  estimation  or  inverse  problems 
wherein  the  quantity  of  interest  is  a  probability  distribution.  In  particular  we  assume  that  we  have  a 
parameter  (q  £  Q)  dependent  system  with  model  responses  x(q)  describing  in  some  manner  a  population 
of  interest.  For  data  or  observations,  we  are  given  a  set  of  values  {zi}  for  the  expected  values 

£[xi(q)\P\  =  (  Xi(q)dP(q) 

JQ 

with  respect  to  an  unknown  probability  distribution  P  describing  the  distribution  of  the  parameters  q  over 
the  population  under  investigation.  We  wish  to  use  this  data  to  choose  from  a  given  family  'P(Q)  the 
distribution  P*  that  gives  a  best  fit  of  the  underlying  model  to  the  data.  Here  we  formulate  an  ordinary 
least  squares  (OLS)  version  of  the  problem,  but  this  is  not  essential  to  our  results  and  one  could  equally 
well  use  a  weighted  least  squares,  a  maximum  likelihood  estimator,  etc.,  approach.  Thus  we  seek  to 
minimize 

J(P)  =  J2  \£[xM)\p]-zi\2 

i 

over  P  £  V(Q)-  Even  for  simple  dynamics  for  X/,  this  is  in  general  an  infinite  dimensional  optimization 
problem  and  approximations  that  lead  to  computationally  tractable  schemes  are  desirable.  That  is,  it  is 
useful  to  formulate  methods  to  yield  finite  dimensional  sets  VM{Q)  over  which  to  minimize  J(P).  Of 
course,  we  wish  to  choose  these  methods  so  that  “VM  (Q)  — >  'P(Q)”  in  some  sense.  In  this  case  we  shall 
use  the  Prohorov  metric  [3,  15]  of  weak  star  convergence  of  measures  to  assure  the  desired  approximation 
results.  A  general  theoretical  framework  is  given  in  [3]  with  specific  results  on  the  approximations  we 
use  here  given  in  [2,  12].  Briefly,  ideas  for  the  underlying  theory  are  as  follows.  One  argues  continuity 
of  P  —>  J(P)  on  V(Q)  with  the  Prohorov  metric  (this  is  trivial  in  the  cases  considered  here).  If  Q  is 
compact  (again,  easily  established  in  our  case)  then  it  is  known  that  V{Q)  is  a  complete  metric  space  and 
is  also  compact  when  taken  with  the  Prohorov  metric.  Moreover,  if  the  approximation  families  VM{Q) 
are  chosen  so  that  elements  PM  £  VM(Q)  can  be  found  to  approximate  elements  P  £  V(Q)  in  the 
Prohorov  metric,  then  well-posedness  (existence,  continuous  dependence  of  estimates  on  data,  etc.)  can 
be  obtained. 

The  data  {zi}  available  (which,  in  general,  will  involve  longitudinal  or  time  evolution  data)  deter¬ 
mines  the  nature  of  the  problem.  The  most  classical  problem  (which  we  shall  refer  to  as  a  Type  /  problem) 
is  one  in  which  individual  longitudinal  data  is  available  for  members  in  the  population.  In  this  case  there 
is  a  wide  statistical  literature  (in  the  context  of  hierarchical  modeling,  mixing  distributions,  mixed  or  ran¬ 
dom  effects,  mixture  models,  etc.)  [14,  17,  18,  19,  20,  26,  27,  28,  29,  30,  33,  34]  which  provides  theory 
and  methodology  for  estimating  not  only  individual  parameters  but  also  population  level  parameters  and 
allows  one  to  investigate  both  intra-individual  and  inter-individual  variability  in  the  population  and  data. 
In  what  we  shall  refer  to  as  Type  II  problems  one  has  only  aggregate  or  population  level  longitudinal 
data  available.  This  is  common  in  marine,  insect,  etc.,  catch  and  release  experiments  [11]  where  one 
samples  at  different  times  from  the  same  population  but  cannot  be  guaranteed  of  observing  the  same  set 
of  individuals  at  each  sample  time.  This  type  of  data  is  also  typical  in  experiments  where  the  organism 
or  population  member  being  studied  is  sacrificed  in  the  process  of  making  a  single  observation  (e.g., 
certain  physiologically  based  pharmacokinetic  (PBPK)  modeling  [13,  21,  31]  and  whole  organism  trans¬ 
port  models  [11]).  In  this  case  one  may  still  have  dynamic  (i.e.,  time  course)  models  for  individuals,  but 
no  individual  data  is  available.  Finally,  the  third  class  of  problems  which  we  shall  refer  to  as  Type  III 
problems  involves  dynamics  which  depend  explicitly  on  the  probability  distribution  P  itself.  In  this  case 
one  only  has  dynamics  ( aggregate  dynamics )  for  the  expected  value 


x{t)=  [  x{t,  q)dP(q) 

JQ 

of  the  state  variable.  No  dynamics  are  available  for  individual  trajectories  x(t,  q)  for  a  given  q  £  Q. 
Such  problems  arise  in  viscoelasticity  and  electromagnetics  as  well  as  biology  [3,  5,  6,  12,  23].  While  the 
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approximations  we  discuss  in  this  paper  are  applicable  to  all  three  types  of  problems,  we  shall  illustrate 
the  computational  results  in  the  context  of  size-structured  marine  populations  where  the  inverse  problems 
are  of  Type  II. 

Finally,  we  note  that  in  the  problems  considered  here,  one  can  not  sample  directly  from  the  probabil¬ 
ity  distribution  being  estimated  and  this  again  is  somewhat  different  from  the  usual  case  treated  in  some 
of  the  statistical  literature,  e.g.,  see  [35,  36]  and  the  references  cited  therein. 


2  The  Growth  Rate  Distribution  Model  and  Inverse  Problem 


We  now  turn  our  attention  to  a  specific  application  of  the  general  ideas  discussed  above.  The  motivating 
application  for  this  work  is  the  estimation  of  growth  rate  distributions  for  size-structured  mosquitofish 
populations.  Mosquitofish  have  been  used  in  the  place  of  pesticides  as  a  way  to  control  mosquito  pop¬ 
ulations  in  rice  fields.  Biologists  desire  to  correctly  predict  the  growth  and  decline  of  the  mosquitofish 
population  in  order  to  determine  the  optimal  densities  of  mosquitofish  to  use  in  rice  fields  to  control 
mosquito  populations.  Thus,  a  mathematical  model  that  accurately  describes  the  mosquitofish  popula¬ 
tion  would  be  beneficial  in  this  application,  as  well  as  in  other  problems  involving  population  dynamics 
and  age/size-structured  data. 

Based  on  data  collected  from  rice  fields,  a  reasonable  mathematical  model  would  have  to  predict 
two  key  features  that  are  exhibited  in  the  data:  dispersion  and  bifurcation  (a  unimodal  density  becomes 
a  bimodal  density)  of  the  population  density  over  time  [4,  9,  10].  The  growth  rate  distribution  model, 
developed  in  [4]  and  [9],  captures  both  of  these  features  in  its  solutions.  This  model  is  a  modification  of 
the  Sinko-Streifer  model  (used  for  modeling  age/size-structured  populations)  which  takes  into  account 
that  individuals  have  different  characteristics  or  behaviors.  The  standard  Sinko-Streifer  model  (SS)  for 
size-structured  mosquitofish  populations  is  given  by 


dv  d  ,  . 

ai  +  ax{9v) 

=  —/XV,  Xo  <  X  <  XI,  t  >  0 

(la) 

u(0,x) 

=  4>(x) 

(lb) 

g(t,x0)v(t,x0) 

=  fxlKMv{t,m 

(lc) 

J  Xo 

g{t,x  i) 

=  0. 

(Id) 

In  this  model,  v(t,  x )  represents  the  size  (given  in  numbers  per  unit  length)  or  population  density,  where 
t  represents  time  and  x  represents  the  length  of  the  mosquitofish.  The  growth  rate  of  the  individual 
mosquitofish  is  given  by  g(t,  x),  where 


dx 

dt 


g{t,x) 


for  each  individual,  so  that  all  mosquitofish  of  a  given  size  have  the  same  growth  rate  in  this  model.  We 
also  note  that  g(t,  x)  represents  the  mortality  rate  of  the  mosquitofish.  The  function  <l>(x)  represents  the 
initial  size  density  of  the  population,  while  K  represents  the  fecundity  kernel.  The  boundary  condition 
at  x  =  xo  is  the  recruitment,  or  birth,  rate,  while  the  boundary  condition  at  x  =  x\  =  £max  ensures  the 
maximum  size  of  the  mosquitofish  is  x\.  The  SS  model  cannot  be  used  as  is  to  model  the  mosquitofish 
population  because  it  does  not  predict  dispersion  or  bifurcation  of  the  population  in  time  under  biologi¬ 
cally  reasonable  assumptions  [4,  9].  However,  by  modifying  the  SS  model  so  that  the  individual  growth 
rates  of  the  mosquitofish  vary  across  the  population  (instead  of  being  the  same  for  all  individuals  in  the 
population),  one  obtains  a  model,  known  as  the  growth  rate  distribution  (GRD)  model,  which  does  in  fact 
exhibit  both  dispersal  in  time  of  the  mosquitofish  population  and  the  development  of  a  bimodal  density 
from  a  unimodal  density  (see  [9,  10]). 

In  the  growth  rate  distribution  (GRD)  model,  the  population  density  u(t ,  x:  P),  discussed  in  [4]  and 
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developed  in  [9],  is  actually  given  by 


u(t,x;P)=  /  v(t,  x\  g)dP(g).  (2) 

Jg 

Here  G  is  a  collection  of  admissible  growth  rates,  P  is  a  probability  measure  on  G,  and  v(t.  x:  g)  is 
the  solution  of  (1)  with  growth  rate  g.  This  model  assumes  that  the  population  is  actually  made  up  of 
collections  of  subpopulations,  where  individuals  in  the  same  subpopulation  have  the  same  growth  rate. 
Based  on  work  in  [9],  solutions  to  this  model  exhibit  both  dispersion  and  bifurcation  of  the  population 
density  in  time.  For  our  considerations  in  this  paper,  we  assume  that  the  admissible  growth  rates  have 
the  form 

g{x;  b ,  7)  =  6(7  -  x) 

for  Xq  <  x  <  7  and  zero  otherwise,  where  b  is  the  intrinsic  growth  rate  of  the  mosquitofish  and  7  =  x  \ 
is  the  maximum  size.  This  choice  is  based  on  work  in  [4],  where  the  idea  of  other  properties  related  to 
the  growth  rates  varying  among  the  mosquitofish  is  discussed.  Under  the  assumption  of  varying  intrinsic 
growth  rates  and  maximum  sizes,  we  in  fact  assume  that  b  and  7  are  random  variables  taking  values 
in  the  compact  sets  B  and  T,  respectively.  A  reasonable  assumption  is  that  both  are  bounded  closed 
intervals.  Thus  we  take 

G  =  {g(-,b, 7)|fee  H,7<e  r} 

so  that  G  is  also  compact  in,  for  example,  C[x 0,  X ]  where  X  =  max(r).  Then  'P(G')  is  compact  in  the 
Prohorov  metric  and  we  are  in  the  framework  outlined  above.  In  the  first  set  of  examples,  we  chose  a 
growth  rate  parameterized  by  the  intrinsic  growth  rate  b  with  7=1,  leading  to  a  one  parameter  family 
of  varying  growth  rates  g  among  the  individuals  in  the  population.  We  also  assume  here  that  //,  =  0  and 
I\  =  0  in  order  to  focus  on  only  the  distribution  of  growth  rates;  however,  distributions  could  just  as 
well  be  placed  on  //  and  K. 

We  next  introduce  two  different  approaches  that  can  be  used  in  the  inverse  problem  for  the  estimation 
of  the  distribution  of  growth  rates  of  the  mosquitofish.  The  first  approach,  which  has  been  discussed  and 
used  in  [9]  and  [10],  involves  the  use  of  delta  functions.  We  assume  that  the  probability  distributions 
VM  placed  on  the  growth  rates  are  discrete  corresponding  to  a  collection  GM  with  the  form  GM  = 
{gk}kLi  where  gk(x)  =  bk{  1  —  x),  for/c  =  1, . . .  ,M.  Here  the  {&;,.}  are  a  discretization  of  B.  For 
each  subpopulation  with  growth  rate  gk,  there  is  a  corresponding  probability  pk  that  an  individual  is  in 
subpopulation  k.  The  population  density  u(t,  x;  P)  in  (2)  is  then  approximated  by 

u{t,  x;  {pfc})  =  ^2  x ’  9k)Vk, 

k 

where  v(t,  x;  gk)  is  the  subpopulation  density  from  (1)  with  growth  rate  gk-  We  denote  this  delta  function 
approximation  method  as  DEL(M),  where  M  is  the  number  of  elements  used  in  this  approximation. 

While  it  has  been  shown  that  DEL(M)  provides  a  reasonable  approximation  to  (2),  a  better  approach 
might  involve  techniques  that  will  provide  a  smoother  approximation  of  (2)  in  the  case  of  continuous 
probability  distributions  on  the  growth  rates.  Thus,  as  a  second  approach,  we  chose  to  use  an  approxima¬ 
tion  scheme  based  on  piecewise  linear  splines.  Here  we  have  assumed  that  P  is  a  continuous  probability 
distribution  on  the  growth  rates.  We  approximate  P'  using  piecewise  linear  splines,  which  leads  to  the 
following  approximation  for  u(t,  x\  P)  in  (2): 


u(t,x;  {ak})  =  V'afc  /  v(t,x;g)lk 

JB 


(1 b)db , 


where  g(x\  b)  =  6(1  —  x),  Pk(b)  =  aklk.ib)  is  the  probability  density  for  an  individual  in  subpopulation 
k  and  Ik  represents  the  piecewise  linear  spline  functions.  This  spline  based  approximation  method  is 
denoted  by  SPL(M,N),  where  M  is  the  number  of  basis  elements  used  to  approximate  the  growth  rate 
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probability  distribution  and  N  is  the  number  of  quadrature  nodes  used  to  approximate  the  integral  in  the 
formula  above.  We  used  the  composite  trapezoidal  rule  for  the  approximation  of  these  integrals  [32], 

We  were  then  able  to  use  the  approximation  methods  DEL(M)  and  SPL(M,N)  in  the  inverse  problem 
for  the  estimation  of  the  growth  rate  distributions.  The  least  squares  inverse  problem  that  we  wish  to 
solve  is 


min  J(P ) 

P  GVm  (G) 


^2  \u(ti,Xj-,P)  -  u(ti,Xj)\2  (3) 

id 

^  ,  Xj ,  .P) )  2,u(f i ,  Xj ,  P^ju{ti ,  Xj )  T  (u(t % ,  Xj J J  , 

hi 


where  ii(t,x)  is  the  data  and  VM(G)  is  the  finite  dimensional  approximation  to  V(G).  When  using 
DEL(M),  the  finite  dimensional  approximation  VM (G)  to  the  probability  measure  space  V ( G )  is  given 
by 

vm(G)  =  Ip &v{G)\p' =  YJPkSak,  Y.P*  = 1 

l  k  k 

where  5gk  is  the  delta  function  with  an  atom  at  gt .  However,  when  using  SPL(M),  the  finite  dimensional 
approximation  VM  (G)  is  given  by 


>m(G)  =\p£  V{G)\P'  =  V  aklk(b),  V  ak  [ 

{  k  k  Jb 


lk(b)db  =  1 


f  hi  rt  her  mo  re,  we  are  able  to  note  that  this  least  squares  inverse  problem  (3)  becomes  a  quadratic 
programming  problem  [9,  10],  Letting  p  be  the  vector  that  contains  pk,  1  <  k  <  M,  when  using 
DEL(M)  or  ak,  1  <  k  <  M,  when  using  SPL(M,N),  we  let  A  be  the  matrix  with  entries  given  by 


v(ti,Xj\gk)v(ti,Xj\gm), 


b  the  vector  with  entries  given  by 


bk  =  -Y&iU'XMtiWgk), 
id 


and 

id 

where  1  <  k,m  <  M.  In  the  place  of  (3),  we  now  minimize 

F(  p)  =  pTAp  +  2pTb  +  c  (4) 

over  VM  (G).  We  note  when  using  DEL(M)  we  also  had  to  include  the  constraint 

^2pk  =  i, 
k 

while  when  using  SPL(M.N)  we  had  to  include  the  constraint 

ak  /  h{b)db  =  1. 

Jb 

However,  in  both  cases,  we  were  able  to  include  these  constraints  in  the  programming  of  these  two 
inverse  problems. 
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3  Numerical  Results 


We  next  present  and  discuss  several  computational  examples  based  on  simulated  data  and  performed 
in  MATLAB  involving  the  estimation  of  the  probability  distribution  P  on  the  growth  rates  of  the  size- 
structured  mosquitofish  population  in  order  to  demonstrate  the  validity  of  the  theoretical  results  discussed 
earlier.  The  particular  examples  used  here  are  based  on  previous  formulations  discussed  in  [4]  and  [9]. 

3.1  Convergence  of  DEL(M)  and  SPL(M,N)  Estimated  Probability  Distributions 

In  order  to  estimate  the  probability  distribution  P*  on  the  growth  rates  of  the  size-structured  mosquitofish 
population,  we  first  began  by  preparing  simulated  population  density  data  independent  of  the  two  approx¬ 
imation  methods  DEL(M)  and  SPL(M,N)  used  in  the  inverse  problem.  Since  we  were  only  concerned 
at  this  point  with  the  estimation  of  P*,  we  let  //  =  0  and  K  =  0  in  the  Sinko-Streifer  model.  We 
then  chose  a  true  distribution  P*  on  the  growth  rates  g(x),  where  g(x)  =  6(1  —  a;)  and  6  is  the  intrin¬ 
sic  growth  rate  of  the  mosquitofish.  More  specifically,  we  assumed  that  the  intrinsic  growth  rate  6  of 
the  mosquitofish  is  a  random  variable  with  distribution  P* .  This  allowed  us  to  generate  a  collection  of 
growth  rates  Gn  =  {gi,  g2,  ■  ■  ■ ,  gn},  where  we  took  n  =  200  in  order  to  get  a  good  approximation  of 

u(t,x-,P*)=  [  v(t,x;g)dP*(g) 

Jg 

and  Gn  C  G.  Our  simulated  data  ud(t,  x\  P*)  was  then  created  by  first  computing  the  solution  v(t,  x\  gi) 
of  the  Sinko-Streifer  model  for  each  individual  gi  using  the  method  of  characteristics  and  then  computing 


ud(t,x;P*)  =  [  v(t,x-,g)dP*(g) 

JGn 

using  the  Gauss-Legendre  integration  methodf  [32]).  We  took  50  uniformly  spaced  time  observations, 
where  the  time  interval  was  [0,  0.5].  The  range  of  size  values  (. xo ,  X\)  was  normalized  to  (0, 1)  and  50 
uniformly  spaced  size  values  were  used  in  this  range  for  our  simulated  data.  For  the  initial  size  density, 
we  used 

_  /  sin2(107ra),  0  <  x  <  0.1, 

1 ’  \  0,  0.1  <  X  <  1. 

For  our  first  set  of  data,  we  placed  a  “truncated”  Gaussian  distribution  on  6  with  mean  6  =  4.5  and 
variance  cr 2  =  0.25,  where  6  £  [6  —  3tr,  6  +  3cr] .  Using  this  range  of  values  for  the  intrinsic  growth 
rates  allows  us  to  capture  approximately  99%  of  the  intrinsic  growth  rates.  However,  to  ensure  that  this 
“truncated”  Gaussian  distribution  was  indeed  a  true  distribution,  we  had  to  scale  the  weights  used  in  the 
Gauss  -Fegendre  integration  method  to  ensure  that 


where  /(6;  6,  cr)  is  the  probability  density  function  corresponding  to  the  probability  distribution  P.  Using 
the  set  of  data  generated  with  this  “truncated”  Gaussian  distribution,  we  then  performed  the  inverse 
problem  using  SPF(M.N)  and  DEL(M)  to  find  estimates  of  the  “true”  probability  density  and  distribution 
(under  the  assumption  that  the  true  probability  distribution  P  was  unknown)  and  then  compared  these 
estimates  to  the  known  density  and  distribution.  As  discussed  earlier,  this  inverse  problem  is  simplified 
to  a  quadratic  programming  problem  when  using  least  squares,  where  we  minimize  pTAp  +  2pJ  b  +  c. 

Overall,  we  found  that  both  SPF(M,N)  and  DEF(M)  produced  good  approximations  of  the  true 
Gaussian  growth  rate  distribution  using  the  simulated  data;  however,  we  also  found  that  in  some  cases 
SPF(M,N)  produced  bad  estimates  of  the  probability  distribution  when  M  and  N  were  not  chosen  cor¬ 
rectly.  In  Figure  1,  we  have  the  results  from  the  inverse  problem  using  SPF(  15,200),  DEF(15),  and 
DEF(45).  We  see  from  the  results  in  Figure  1  that  the  spline  based  approximation  method  converges  in 
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Known  and  Estimated  Probability  Density  for  SPL(1 5,200)  Known  and  Estimated  Probability  Distribution  for  SPL(1 5,200) 


Known  and  Estimated  Probability  Density  for  DEL(15)  Known  and  Estimated  Probability  Distribution  for  DEL(15) 


Known  and  Estimated  Probability  Density  for  DEL(45) 


Known  and  Estimated  Probability  Distribution  for  DEL(45) 


Figure  1:  Estimates  of  probability  densities  and  distributions  for  true  gaussian  distribution  using 
a)SPL(15,200),  b)DEL(15),  c)DEL(45) 


distribution  much  faster  than  the  delta  function  approximation  method  for  a  given  M,  which  we  expect 
since  the  true  distribution  was  smooth  and  continuous.  For  M  =  15  and  N  =  200,  the  estimates  of  the 
probability  distribution  from  SPL(M,N)  have  converged  completely  to  the  true  distribution  whereas  the 
estimates  of  the  probability  distribution  from  DEL(M)  have  not  quite  converged.  As  M  is  increased, 
the  estimated  probability  distributions  from  DEL(M)  become  better  as  seen  in  the  results  for  M  =  45. 
We  also  note  that  along  with  convergence  in  distribution,  SPL(M,N)  also  provides  convergence  in  den¬ 
sity,  whereas  DEL(M)  does  not  provide  convergence  in  density.  This  may  be  attributed  largely  to  the 


7 


difference  in  the  constraints  for  these  two  methods.  SPL(M,N)  requires 

V'afc  /  lk(b)db  =  1, 

k  Jb 

wher epk(b)  =  aklk(b)  is  the  probability  density  for  an  individual  in  subpopulation  k  and  lk(b)  represents 
the  piecewise  linear  spline  “functions.”  On  the  other  hand,  DEL(M)  requires 

k 

where  pk  denotes  the  probability  coefficients  and  5qk  represents  the  delta  functions.  Since  the  true 
density  was  in  fact  smooth  and  continuous,  one  would  not  expect  convergence  in  density  when  using 
DEL(M)  because  it  is  much  cruder  in  its  approximation  of  (5).  We  remark  that  this  agrees  fully  with 
the  underlying  theory  for  convergence  of  distributions  in  the  Prohorov  metric  wherein  convergence  of 
densities  is  not  guaranteed. 

We  also  used  a  second  set  of  data  in  the  inverse  problem  with  a  “truncated"  Bi-Gaussian  distrib¬ 
ution  on  the  intrinsic  growth  rates  b.  Based  on  previous  work  in  [4],  this  type  of  distribution  leads  to 
data  which  exhibits  both  dispersion  and  bifurcation,  which  are  two  traits  observed  in  actual  mosquitofish 
field  data  [4,  9,  10].  In  order  to  obtain  a  Bi-Gaussian  distribution,  we  took  the  average  of  two  Gaussian 
distributions,  one  with  mean  bi  =  3.3  and  variance  <j\  =  0.492  and  the  second  with  mean  b 2  =  5.7  and 
variance  <j\  —  0.492.  The  simulated  data  was  prepared  in  the  same  way  as  described  above  except  with 
b  £  \b\  —  3ai,t>2  +  3ct2] .  The  results  for  the  inverse  problem  using  this  set  of  data  are  shown  in  Figure 
2  for  SPL(25,200),  DEL(25),  and  DEL(85).  Both  methods  do  a  good  job  of  estimating  the  Bi-Gaussian 
probability  distribution  with  the  simulated  data.  Again,  we  see  that  SPL(M,N)  converges  to  the  true  dis¬ 
tribution  faster  than  DEL(M).  However,  it  should  be  noted  that  more  basis  elements  (larger  values  of  M) 
were  required  in  both  methods  to  achieve  the  same  level  of  accuracy  in  approximating  the  Bi-Gaussian 
probability  distribution  in  comparison  to  the  Gaussian  distribution.  As  mentioned  earlier,  the  spline 
based  approximation  method  results  in  both  convergence  in  density  and  distribution  for  this  example 
as  well,  while  the  delta  function  approximation  method  only  results  in  convergence  in  distribution.  We 
see  that  significantly  more  basis  elements  are  required  for  full  convergence  of  the  approximations  from 
DEL(M)  to  the  true  Bi-Gaussian  probability  distribution  . 

In  the  examples  given  above,  SPL(M,N)  produced  good  approximations  to  both  the  Gaussian  and 
Bi-Gaussian  densities  and  distributions.  However,  as  we  stated,  obtaining  good  approximations  when 
using  SPL(M,N)  was  very  much  dependent  on  choosing  M  and  N  appropriately.  In  fact,  we  found 
if  M  and  N  were  not  chosen  carefully,  the  estimates  of  the  probability  distributions  from  the  inverse 
problem  using  SPL(M,N)  were  not  very  good  as  a  result  of  the  problem  becoming  unstable.  However, 
by  studying  the  condition  number  of  the  matrix  A  from  the  quadratic  programming  problem,  we  found 
that  this  behavior  could  be  readily  explained.  There  are  several  different  ways  in  which  the  condition 
number  k(A)  of  a  matrix  A  can  be  described  [32],  In  terms  of  the  norm  ||  •  ||  of  a  matrix,  we  have 
k( A)  =  ||  A-1 1|  •  ||  A|| .  The  condition  number  of  a  matrix  A  can  also  be  defined  as  the  ratio  of  the  largest 
singular  value  to  the  smallest  singular  value  in  the  singular  value  decomposition  of  the  matrix  A  (see 
[32]  for  further  discussion).  What  is  of  most  importance  is  the  information  one  learns  from  studying 
the  condition  number  of  a  matrix.  The  matrix  A  is  well-conditioned  (well-behaved)  if  k(A)  is  relatively 
small.  On  the  other  hand,  A  is  ill-conditioned  (ill-behaved)  if  k(A)  is  relatively  large.  Thus,  if  k(A) 
is  very  large,  meaning  A  is  ill-conditioned,  the  inverse  problem  becomes  unstable,  which  leads  to  poor 
approximations  of  the  probability  distribution  P.  We  note  that  the  discussion  here  is  limited  to  SPL(M,N) 
based  on  the  fact  that  the  matrix  A  for  the  spline  based  method  can  become  ill-conditioned  for  a  given  M 
based  on  the  number  of  quadrature  nodes  used  in  the  composite  trapezoidal  method  used  for  integration 
purposes  as  discussed  earlier.  However,  the  matrix  A  in  DEL(M)  does  not  change  for  a  given  M  due  to 
the  way  in  which  the  population  density  and  A  is  obtained  as  discussed  in  the  previous  section.  Since  we 
must  use  a  quadrature  method  to  compute  A  for  SPL(M,N),  we  expect  the  number  of  quadrature  nodes 
N  used  to  have  a  role  in  determining  k(A').  In  fact,  if  N  is  chosen  too  small  for  a  given  M,  meaning 


Known  and  Estimated  Probability  Density  for  SPL(25,200) 


Intrinsic  Growth  Rates,  b 


Known  and  Estimated  Probability  Distribution  for  SPL(25,200) 


Known  and  Estimated  Probability  Density  for  DEL(25) 


Intrinsic  Growth  Rates,  b 


Known  and  Estimated  Probability  Distribution  for  DEL(25) 


Known  and  Estimated  Probability  Density  for  DEL(85)  Known  and  Estimated  Probability  Distribution  for  DEL(85) 


Figure  2:  Estimates  of  probability  densities  and  distributions  for  true  bi-gaussian  distribution  using 
a)SPL(25,200),  b)DEL(25),  c)DEL(85) 


the  quadrature  rule  gives  a  very  coarse  approximation  to  the  actual  integration,  then  we  expect  k(A) 
to  become  larger,  the  problem  to  become  unstable,  and  the  estimates  of  the  probability  distribution  to 
become  poor. 

To  explore  the  validity  of  this  argument,  we  computed  the  condition  number  of  A  for  M  =  5,  25,  55,  75, 
and  95  and  N  =  50, 100, 150,200,250,  and  300  in  both  the  Gaussian  and  Bi-Gaussian  examples  that 
were  used  above.  The  resulting  condition  numbers  of  A  for  the  Gaussian  example  are  given  in  Table  1 . 
As  we  can  see  from  the  values  in  the  table,  when  M  =  5  and  25,  k(A)  is  relatively  small  for  all  listed 
values  of  N.  We  found  that  the  inverse  problem  produced  good  estimates  of  the  Gaussian  probability 
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M 

N 

~  «(A) 

5 

50,100,150,200,250,300 

9.35 

25 

50,100,150,200,250,300 

87 

55 

50 

1017 

55 

100,150,200,250,300 

lo3 

75 

50 

10l9 

75 

100,150,200,250,300 

105 

95 

50 

1018 

95 

100,150,200,250,300 

106 

Table  1 :  Computed  condition  numbers  of  A  for  gaussian  example  when  using  SPL(M,N) 


distribution  when  using  SPL(5,N)  and  SPL(25,N).  On  the  other  hand,  at  M  =  55,  we  began  to  see  some 
significant  differences  in  k(A),  as  shown  by  the  values  in  Table  1.  It  can  be  noted  that  for  M  =  55,  75, 
and  95,  k(A)  was  very  large  when  using  SPL(M,50).  However,  there  was  a  significant  decrease  in  the 
condition  number  of  A  when  using  SPL(M.N)  for  N  =  100, 150, 200, 250,  300  for  these  values  of  M. 
This  difference  in  condition  numbers  was  also  evident  in  the  estimates  of  the  probability  distribution 
from  the  inverse  problem.  For  M  =  55,  75,  and  95,  the  estimates  when  using  SPL(M,  50)  were  worse 
than  those  obtained  when  using  SPL(M,N)  for  all  other  listed  values  of  N.  We  note  the  estimates  of  the 
Gaussian  probability  distribution  became  better  as  the  value  of  N  was  increased  for  these  fixed  values  of 
M. 


M 

N 

~  «(A) 

5 

50,100,150,200,250,300 

13.23 

25 

50,100,150,200,250,300 

71 

55 

50 

1016 

55 

100,150,200,250,300 

330 

75 

50 

1017 

75 

100,150,200,250,300 

103 

95 

50 

1019 

95 

100,150,200,250,300 

103 

Table  2:  Computed  condition  numbers  of  A  for  bi-gaussian  example  when  using  SPL(M,N) 

For  the  Bi-Gaussian  example,  we  also  computed  n(A  )  for  the  same  values  of  M  and  N,  and  the  results 
in  this  case  are  given  in  Table  2.  The  results  obtained  when  using  a  “true”  Bi-Gaussian  probability  distri¬ 
bution  were  very  similar  to  the  results  obtained  when  using  a  “true”  Gaussian  probability  distribution.  At 
M  =  5  and  at  M  =  25,  k(A)  was  relatively  small  (13.23  and  71,  respectively,  for  each  value  of  N).  The 
inverse  problem  for  these  values  of  M  was  also  stable,  and  the  estimates  of  the  probability  distribution  in 
both  cases  were  good.  However,  when  M  =  55,  75,  and  95,  SPL(M,50)  results  in  large  condition  num¬ 
bers  for  A.  The  estimates  for  the  probability  distribution  using  SPL(M,50)  were  very  poor  in  comparison 
to  the  estimates  obtained  from  SPL(M,N)  for  M  =  55, 75, 95  and  N  =  100, 150,  200, 250, 300.  When 
SPL(M,N)  was  used  in  the  inverse  problem  for  the  estimation  of  the  Bi-Gaussian  probability  distribution 
for  M  =  55,  75,  95  and  for  values  of  N  greater  than  50,  we  saw  a  decrease  in  the  condition  numbers  of 
A  (see  values  in  Table  2)  corresponding  to  better  approximations  of  the  probability  distribution. 

To  summarize  these  computational  results,  in  both  the  Gaussian  and  Bi-Gaussian  case,  when  N  >  M, 
k(A)  was  relatively  low,  which  resulted  in  good  estimates  of  the  growth  rate  probability  distributions 
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when  using  SPL(M,N).  However,  when  using  SPL(M,N),  for  M  ~  N,  the  condition  number  of  A  was 
very  large,  resulting  in  an  ill -posed  inverse  problem  and  poor  estimates  of  the  probability  distributions. 
For  a  fixed  M,  we  observed  that  as  the  value  of  N  increased,  the  condition  number  of  A  decreased,  which 
agrees  with  results  in  [7]  and  [8],  Therefore,  we  have  shown  by  these  computational  efforts  that  we  can 
“regularize”  the  inverse  problem  when  using  SPL(M,N)  by  choosing  proper  ratios  of  M  and  N,  which 
is  known  as  “regularization  by  discretization  balance.”  By  using  a  finer  discretization  in  the  quadrature 
method  used  in  SPL(M.N),  we  were  able  to  obtain  better  results  in  the  inverse  problem  involving  the 
estimation  of  growth  rate  distributions. 

3.2  Sensitivity  of  DEL(M)  and  SPL(M,N)  Estimated  Probability  Distributions 

In  the  previous  subsection,  we  discussed  the  results  from  two  examples  using  simulated  data  in  the 
estimation  of  probability  distributions  P  on  the  growth  rates  of  size-structured  mosquitofish  populations. 
However,  data  collected  from  an  experiment  is  usually  corrupted  by  noise,  which  can  be  a  result  of  errors 
in  collecting  the  data,  errors  in  the  instruments  and  techniques  used,  etc.  Along  with  verifying  that  both 
SPL(M.N)  and  DEL(M)  produce  estimates  which  converge  in  distribution  when  simulated  data  with  no 
noise  is  used  in  the  inverse  problem,  we  also  wanted  to  be  able  to  make  some  remarks  about  the  sensitivity 
with  respect  to  noisy  data  of  the  estimates  of  the  probability  distributions  from  the  two  approximation 
methods.  Thus,  we  added  random  absolute  noise  to  the  simulated  data  used  in  the  previous  two  examples 
in  the  following  way: 

u(t,  x ;  P*)  =  u(t,  x ;  P*)  +  rj  ■  e, 

where  r]  represents  the  noise  level  constant  and  e  represents  normally  distributed  random  values  with 
mean  0  and  variance  1.  We  then  performed  the  inverse  problem  again  using  rj  =  0.005,0.025,0.05 
corresponding  to  1%,  5%  and  10%  absolute  error,  respectively,  for  both  the  Gaussian  and  Bi-Gaussian 
cases. 

We  begin  by  discussing  the  results  of  the  inverse  problem  using  the  simulated  data  with  a  true 
“truncated”  Gaussian  distribution  with  the  various  noise  level  constants.  Both  approximation  meth¬ 
ods,  SPL(M.N)  and  DEL(M),  performed  decently  in  the  inverse  problem  with  the  varying  percentages 
of  absolute  error.  With  only  1%  absolute  error,  both  DEL(M)  and  SPL(M,N),  with  M  and  N  chosen 
appropriately,  resulted  in  estimates  that  converged  to  the  true  growth  rate  probability  distribution  in  very 
much  the  same  manner  as  discussed  above  in  the  Gaussian  example  with  no  noise.  The  performance  of 
these  approximations  methods  was  not  greatly  effected  by  the  small  amount  of  noise  in  the  data.  With  a 
slightly  larger  percentage  of  absolute  error  in  the  simulated  data,  SPL(M,N)  and  DEL(M)  were  still  able 
to  produce  good  estimates  of  the  probability  distribution.  However,  the  results  from  the  inverse  prob¬ 
lem  using  r]  =  0.025  began  to  exhibit  some  small  effects  in  the  estimates  obtained  from  both  DEL(M) 
and  SPL(M,N).  For  example,  in  Figure  3,  the  approximated  probability  distributions  for  DEL(25)  and 
SPL(25,200)  with  5%  absolute  error  reveal  slightly  overestimated  front  tails.  Moreover,  SPL(25,200) 
with  5%  absolute  error  resulted  in  small  perturbations  in  the  approximated  density,  which  was  very 
smooth  when  no  noise  was  present  in  the  data.  With  very  noisy  data,  77  =  0.05,  SPL(M,N)  and  DEL(M) 
still  perform  fairly  well.  From  the  results  for  DEL(25)  and  SPL(25,200),  shown  in  Figure  3,  the  noisier 
data  resulted  in  only  slightly  poorer  approximations  in  comparison  to  those  obtained  with  5%  absolute 
noise.  Moreover,  the  larger  amount  of  noise  produced  more  oscillatory  behavior  in  the  approximated 
probability  densities  for  both  DEL(M)  and  SPL(M,N),  which  would  result  in  poorer  approximations  of 
the  corresponding  probability  distributions. 

We  also  tested  the  two  approximation  methods  for  sensitivity  to  error  in  the  Bi-Gaussian  data  with 
the  same  percentages  of  absolute  error.  The  results  obtained  from  the  inverse  problem  using  DEL(M) 
and  SPL(M,N)  with  noisy  data  with  a  “true”  Bi-Gaussian  distribution  were  very  similar  to  those  obtained 
when  using  noisy  data  with  a  “true”  Gaussian  distribution.  Overall,  the  estimated  probability  distribu¬ 
tions  from  DEL(M)  and  SPL(M.N)  were  not  largely  affected  by  the  various  amounts  of  noise  added  to  the 
simulated  data.  Both  methods  were  able  to  produce  good  approximations  of  the  probability  distributions 
in  the  presence  of  noise.  With  1%  absolute  error  in  the  data,  the  estimates  of  the  growth  rate  distributions 
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from  DEL(M)  and  SPL(M,N)  did  not  change  significantly  from  the  estimates  obtained  when  there  was  no 
noise  in  the  data.  We  were  still  able  to  obtain  convergence  in  distribution  (with  faster  convergence  when 
using  SPL(M,N))  with  both  approximation  methods.  When  the  percentage  of  absolute  error  in  the  data 
was  5%,  DEL(M)  and  SPL(M.N)  still  performed  well  by  producing  good  estimates  of  the  Bi-Gaussian 
probability  distribution.  The  small  amount  of  noise  had  some  small  effect  on  the  estimates  as  seen  in 
Figure  4;  in  fact,  we  see  that  the  front  tails  in  both  the  estimate  from  DEL(35)  and  SPL(35,200)  for 
77  =  0.025  are  slightly  largely  than  the  tails  for  the  “true”  distribution.  When  even  more  noise  is  present 
in  the  data,  the  estimated  probability  distributions  became  slightly  poorer  for  a  fixed  M  and  N.  In  Fig¬ 
ure  4,  the  estimated  probability  densities  and  distributions  are  shown  for  DEL(35)  and  SPL(35)  for  data 
with  10%  absolute  error.  It  is  clear  from  these  plots  that  the  estimates  from  DEL(M)  and  SPL(M,N)  are 
indeed  affected  by  the  noisier  data.  As  in  the  Gaussian  case,  we  noticed  some  oscillatory  behavior  in  the 
estimated  probability  densities  from  these  two  approximation  methods  as  the  amount  of  noise  present  in 
the  data  increased.  Moreover,  the  front  tails  in  the  estimated  probability  distributions  are  overestimated, 
whereas  the  end  tails  are  underestimated  for  both  DEL(35)  and  SPL(35,200). 

While  we  were  still  able  to  obtain  convergence  in  distribution  using  both  DEL(M)  and  SPL(M.N), 
with  M  and  N  chosen  carefully,  our  computational  results  showed  that  as  more  noise  was  added  to  both 
the  Gaussian  and  Bi-Gaussian  data,  the  estimates  of  the  growth  rate  distributions  from  both  methods 
became  relatively  poorer  for  a  fixed  M  and  N.  SPL(M,N)  produced  probability  distribution  estimates 
that  converged  faster  in  distribution  than  DEL(M)  using  both  data  with  noise  and  without  noise.  This 
behavior,  as  stated  earlier,  was  expected  since  the  “true”  probability  distributions  in  these  numerical 
examples  are  smooth  and  continuous.  Although  some  of  our  computational  evidence  for  SPL(M,N) 
also  exhibited  convergence  in  density  as  well  as  distribution,  convergence  in  density  is  generally  not 
guaranteed  and  is  not  supported  by  the  theory  underlying  this  work  [1,  2,  3,  12], 
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Known  and  Estimated  Probability  Density  for  DEL(25)  -  5%  Absolute  Error 


Known  and  Estimated  Probability  Distribution  for  DEL(25)  -  5%  Absolute  Error 


Known  and  Estimated  Probability  Density  for  DEL(25)  -  10%  Absolute  Error 


Known  and  Estimated  Probability  Distribution  for  DEL(25)  -  10%  Absolute  Error 


Known  and  Estimated  Probability  Density  for  SPL(25,200)  -  5%  Absolute  Error  Known  and  Estimated  Probability  Distribution  for  SPL(25,200)  -  5%  Absolute  Error 


Known  and  Estimated  Probability  Density  for  SPL(25,200)  -  10%  Absolute  Error  Known  and  Estimated  Probability  Distribution  for  SPL(25,200)  -  10%  Absolute  Error 


Figure  3:  Estimates  of  probability  densities  and  distributions  for  true  gaussian  distribution  using  a)DEL(25) 
with  5%  absolute  error,  b)DEL(25)  with  10%  absolute  error,  c)SPL(25,200)  with  5%  absolute  error, 
d)SPL(25,200)  with  10%  absolute  error 
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Known  and  Estimated  Probability  Density  for  DEL(35)  -  5%  Absolute  Error  Known  and  Estimated  Probability  Distribution  for  DEL(35)  -  5%  Absolute  Error 


Known  and  Estimated  Probability  Density  for  DEL(35)  -  10%  Absolute  Error  Known  and  Estimated  Probability  Distribution  for  DEL(35)  -  10%  Absolute  Error 


Known  and  Estimated  Probability  Density  for  SPL(35,200)  -  5%  Absolute  Error  Known  and  Estimated  Probability  Distribution  for  SPL(35,200)  -  5%  Absolute  Error 


Known  and  Estimated  Probability  Density  for  SPL(35,200)  -  1 0%  Absolute  Error  Known  and  Estimated  Probability  Distribution  for  SPL(35,200)  -  1 0%  Absolute  Error 


Figure  4:  Estimates  of  probability  densities  and  distributions  for  true  bi-gaussian  distribution  using 
a)DEL(35)  with  5%  absolute  error,  b)DEL(35)  with  10%  absolute  error,  c)SPL(35,200)  with  5%  absolute 
error,  d)SPL(35,200)  with  10%  absolute  error 
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3.3  Sensitivity  Analysis  of  Approximation  Methods  with  Respect  to  Noisy  Data 

We  next  considered  the  sensitivity  of  the  two  approximation  methods,  DEL(M)  and  SPL(M,N),  with 
respect  to  noisy  data  in  order  to  make  some  comments  about  the  uncertainty  associated  with  the  estimated 
growth  rate  distributions.  We  cannot  physically  observe  the  entire  population;  however,  we  can  include 
some  measures  (e.g.,  confidence  intervals)  on  the  uncertainty  of  the  estimates  obtained  from  the  two 
approximation  methods  when  using  only  a  sample  from  the  population.  We  follow  the  standard  statistical 
framework  for  asymptotic  (as  sample  size  n  — >  oo)  distributions  for  OLS  estimators  [19,  22,  25], 

We  began  by  considering  the  following  statistical  model  for  the  mosquitofish  population  density 

y (%)  =  0)  +Cj,  j  =  1, . . . , n, 

where  {xj}  corresponds  to  (ti,xm),l  =  1, . . . ,  nt,  m  =  1 , . ..,  nx  pairs  ( nt  corresponds  to  the  number 
of  time  values,  nx  corresponds  to  the  number  of  size  values  used,  and  n  =  nf  ■  nx).  We  also  note  that 

E[Yj]  =  f(xj,90)  and  var  [Yj]  =  <j%, 

where  9q  ~  9  =  {pk}kLi  when  using  DEL(M)  and  9q  ss  9  =  {ak}^Li  when  using  SPL(M,N)  and 
ej  ~  A/"(0,  Uq).  Here,  9q  represents  the  “true”  parameter  value  and  (Tq  represents  the  “true”  variance 
for  the  system  (which  are  generally  not  known)  and  the  9  are  the  parameters  to  be  estimated  for  90. 
Furthermore,  we  note  that 


M 

f(xj,9 0)  «  ^ ~2pkv(xj-,gk ), 
fc= t 

where  gk{x)  =  j  —  x),  when  considering  DEL(M),  while 

M 

f(xj,60)  ~  Qfc  /  v{xf,  g)lk(b)db 
fc=i  Jb 


(6) 


(7) 


when  considering  SPL(M,N). 

As  stated  previously,  our  goal  is  to  quantify  the  uncertainty  associated  with  the  estimated  growth  rate 
distributions  from  the  methods  DEL(M)  and  SPL(M.N).  We  will  make  statements  about  the  reliability  of 
our  estimates  based  upon  standard  errors.  That  is,  we  will  compute  confidence  intervals  corresponding 
to  the  estimated  growth  rate  distributions.  We  note  as  n  — >  oo, 

9ols(Y)  ~  Am (0o,  <Xo[XT (90)X (9q)]  1)  =A/m(0 o,^o), 
where  X(9)  =  j^{9)  =  Fg{9)  is  the  n  x  M  sensitivity  matrix  with  elements 


Xjk{9) 


df{xj,0) 

d9k 


We  used  the  ordinary  least  squares  estimator: 


n 

minimize  in  9  ^(Yj  -  f(xj,9))2. 
j— i 

The  estimates  9  for  the  growth  rate  distribution  minimize 

n 

.7=1 
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for  a  particular  realization  or  data  set  and  result  from  a  quadratic  programming  problem  as  dis¬ 
cussed  earlier.  As  we  have  remarked,  that  along  with  0f}  being  unknown,  rru  is  also  usually  unknown. 
Thus,  in  order  to  compute  the  standard  errors  associated  with  9 ,  we  must  also  estimate  op-  We  use  the 
following  estimate  for  ctq  : 


a 


1 

n  —  M 


IL  2 

^2(yj  ~  ffaidj)  ■ 
0= 1 


We  then  use  these  estimates  to  compute  an  estimate  of  the  covariance  matrix  Eg  : 


So 


E  =  (T2 


XT(6)X{9) 


Moreover,  we  note  that  the  standard  errors  for  the  growth  rate  distributions  estimates  Of,,  are  given  by 


SE(4) 


Before  we  present  some  results,  we  want  to  make  a  few  comments  about  the  covariance  matrix  E. 
We  note  that  determining  <r2  is  very  straightforward  once  we  have  6.  We  simply  multiply  the  residual  at 
0  by  in  order  to  compute  er2.  We  must  also  compute  X(0),  which  can  be  more  difficult  in  general 
when  dealing  with  nonlinear  systems.  However,  as  we  noted  earlier,  the  elements  of  the  n  x  M  matrix 
X  (9)  are  given  by 


Xjk{9) 


df(xj,9) 

d9k 


These  are  actually  the  sensitivity  elements  associated  with  this  system.  We  note  for  DEL(M)  the  entries 
in  the  sensitivity  matrix  X(9)  are  given  by 


/n\  df(xj,9)  , 

Xjk{0)  =  dQk —  =  v{xj,gk), 

where  9k(pk  or  ak)  is  the  probability  parameter  associated  with  growth  rate  gk{x)  =  bk( j  —  x).  For 
SPL(M,N),  the  entries  in  the  sensitivity  matrix  X  (9)  are  given  by 

Xjk(9)  =  6 ^  =  J^v{xj,g)lk(b)db. 

Since  both  (6)  and  (7)  are  linear  in  0,  then  computing  X  in  this  case  is  also  very  straightforward.  Fur¬ 
thermore,  the  asymptotic  distributional  results  given  earlier  are  exact  in  this  case  (see  [19,  22,  25])  as 
opposed  to  only  being  an  approximation  when  f(xj,  9)  is  nonlinear  in  9. 

We  present  next  some  findings  on  the  sensitivity  of  DEL(M)  and  SPL(M,N)  using  simulated  data 
generated  with  a  “true”  Bi-Gaussian  distribution.  Table  3  displays  the  estimated  probability  density 
values  and  corresponding  confidence  intervals  for  DEL(9)  and  SPL(9,200)  in  the  presence  of  5%  and 
10%  absolute  error.  In  Figure  5,  we  see  the  confidence  intervals  corresponding  to  the  estimated  growth 
rate  distributions  with  a  =  0.05  for  DEF(9)  and  SPF(9,200)  with  simulated  data  with  both  5%  and  10% 
absolute  error.  The  endpoints  of  the  confidence  intervals  are  given  by 

d±h_a/2sE{e), 

where  ti_a/2  is  a  distribution  value  that  is  determined  by  the  level  of  significance  that  is  chosen  [24]. 
After  a  level  of  significance  is  chosen,  we  determine  the  corresponding  t \-a/2  value  from  a  statistical 
table  for  the  t-distribution.  We  chose  to  use  a  =  0.05  for  a  significance  level  of  95%,  which  corresponds 
to  t\_a/2  =  1.96  when  the  number  of  degrees  of  freedom  is  large,  i.e.,  n  >  30.  Based  on  the  confidence 
intervals,  we  can  make  statements  about  the  estimation  procedure  constructed  from  a  realization  of  Y. 
If  the  resulting  confidence  intervals  are  relatively  large  in  relation  to  9 .  then  we  are  not  very  confident 
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DEL(9)  -  5% 

DEL(9)  -  10% 

SPL(9,200)  -  5% 

SPL(9,200)  -  10% 

Pi 

0.1724  ±0.0191 

0.1750  ±  0.0196 

0.0203  ±0.0130 

0.0460  ±  0.0252 

P2 

0.1506  ±0.0170 

0.1487  ±0.0174 

0.0459  ±0.0075 

0.0452  ±  0.0145 

P3 

0.1665  ±0.0149 

0.1644  ±  0.0152 

0.2447  ±  0.0069 

0.2404  ±  0.0133 

P4 

0.1432  ±0.0128 

0.1386  ±0.0131 

0.2651  ±0.0063 

0.2506  ±0.0121 

P5 

0.0948  ±0.0115 

0.1005  ±  0.0117 

0.0895  ±0.0057 

0.1121  ±0.0110 

Pd 

0.1084  ±0.0098 

0.1058  ±0.0100 

0.2738  ±0.0051 

0.2593  ±  0.0098 

P7 

0.0953  ±0.0090 

0.0946  ±  0.0092 

0.2476  ±  0.0045 

0.2509  ±0.0088 

P8 

0.0442  ±  0.0080 

0.0482  ±  0.0082 

0.0338  ±  0.0038 

0.0397  ±  0.0075 

P9 

0.0246  ±  0.0072 

0.0242  ±  0.0073 

0.0000  ±0.0056 

0.0038  ±0.0109 

Table  3:  Estimated  probability  density  values  and  confidence  intervals  for  true  bi-gaussian  distribution  for 
DEL(9)  and  SPL(9,200)  with  5%  and  10%  absolute  error 


about  the  estimation  procedure  used  to  estimate  do .  In  the  results  presented  here,  we  can  state  that  we  are 
95%  confident  that  intervals  constructed  using  the  estimation  procedures  with  DEL(M)  and  SPL(M,N) 
would  “cover”  0q.  We  note  that  for  the  fixed  value  M  =  9  the  confidence  intervals  corresponding  to 
a  =  0.05  when  using  DEL(M)  are  relatively  small  in  relation  to  the  estimated  9  for  data  with  both 


Known  and  Estimated  Probability  Density  with  Confidence  Intervals  Known  and  Estimated  Probability  Density  with  Confidence  Intervals 


Known  and  Estimated  Probability  Density  with  Confidence  Intervals 


Intrinsic  Growth  Rates,  b 


Known  and  Estimated  Probability  Density  with  Confidence  Intervals 


Intrinsic  Growth  Rates,  b 


Figure  5 :  Estimates  of  probability  densities  with  confidence  intervals  given  a  true  bi-gaussian  distribution 
using  a)DEL(9)  with  5%  absolute  error,  b)DEL(9)  with  10%  absolute  error,  c)SPL(9,200)  with  5%  absolute 
eiTor,  d)SPL(9,200)  with  10%  absolute  error 
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5%  and  10%  absolute  error.  Moreover,  we  note  from  Figure  5  that  the  resulting  confidence  intervals 
for  DEL(9)  with  5%  and  10%  absolute  error  are  approximately  the  same.  Thus,  for  M  =  9,  the  delta 
function  approximation  method  appears  to  be  insensitive  to  noisy  data.  In  comparison,  we  note  for  this 
fixed  value  M  =  9  when  using  SPL(M,N)  the  resulting  confidence  intervals  are  relatively  larger  for  data 
with  10%  absolute  error  in  comparison  to  those  for  data  with  5%  absolute  error.  Thus,  the  confidence 
associated  with  the  estimator  procedure  based  on  SPL(M,N)  appears  to  decrease  as  the  percentage  of 
absolute  error  increases.  However,  the  confidence  intervals  are  still  relatively  small  in  relation  to  the 
estimates  9.  Based  on  these  results,  we  would  infer  that  for  this  fixed  value  of  M,  the  spline  based 
approximation  method  appears  to  be  very  slightly  sensitive  to  very  noisy  data. 


Known  and  Estimated  Probability  Density  with  Confidence  Intervals 


Intrinsic  Growth  Rates,  b 


Known  and  Estimated  Probability  Density  with  Confidence  Intervals 


Known  and  Estimated  Probability  Density  with  Confidence  Intervals 


Intrinsic  Growth  Rates,  b 


Known  and  Estimated  Probability  Density  with  Confidence  Intervals 


Figure  6:  Estimates  of  probability  densities  with  confidence  intervals  given  a  true  bi-gaussian  distribution 
using  a)DEL(15)  with  5%  absolute  error,  b)DEL(15)  with  10%  absolute  error,  c)SPL(15,200)  with  5% 
absolute  error,  d)SPL(15,200)  with  10%  absolute  error 

We  note  that  the  estimated  probability  density  values  and  corresponding  confidence  intervals  for 
DEL(15)  and  SPL(15,200)  in  the  presence  of  5%  and  10%  absolute  error  are  given  in  Table  4.  In 
Figure  6,  we  see  the  confidence  intervals  corresponding  to  the  estimated  growth  rate  distributions  with 
a  =  0.05  for  DEL(15)  and  SPL(15,200)  with  simulated  data  with  both  5%  and  10%  absolute  error.  The 
endpoints  of  the  confidence  intervals  are  constructed  in  the  same  way  as  discussed  above.  We  can  again 
state  that  we  are  95%  confident  that  intervals  constructed  using  the  estimation  procedures  with  DEL(M) 
and  SPL(M.N)  would  “cover”  9q-  From  Figure  6  we  see  that  the  confidence  intervals  corresponding  to 
a  =  0.05  when  using  DEF(M)  for  M  =  15  are  relatively  small  in  relation  to  0  for  data  with  5%  and 
10%  absolute  error  much  like  those  computed  for  M  =  9.  We  also  see  in  this  case  that  the  confidence 
intervals  are  approximately  the  same  for  both  sets  of  data.  We  arrive  at  the  same  conclusion  for  M  =  15 
as  we  did  for  M  =  9;  that  is,  the  delta  function  approximation  method  appears  to  be  insensitive  to 
noisy  data.  We  now  look  at  the  results  for  M  =  15  when  using  SPF(M,N)  with  data  with  5%  and  10% 
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DEL(15)  -  5% 

DEL(15)  -  10% 

SPL(  15,200)  -  5% 

SPL(  15,200)  -  10% 

Pi 

0.0788  ±0.0141 

0.0826  ±0.0148 

0.0335  ±  0.0192 

0.0650  ±0.0376 

P2 

0.0689  ±0.0132 

0.0714  ±0.0138 

0.0180  ±0.0110 

0.0197  ±0.0241 

P3 

0.0771  ±0.0123 

0.0774  ±0.0129 

0.0823  ±0.0104 

0.0899  ±  0.0204 

P4 

0.0983  ±0.0115 

0.0896  ±0.0121 

0.1752  ±  0.0099 

0.1632  ±0.0193 

P5 

0.1123  ±0.0107 

0.1151  ±0.0113 

0.2735  ±  0.0093 

0.2738  ±0.0183 

Pa 

0.1064  ±0.0101 

0.1053  ±0.0105 

0.2725  ±  0.0088 

0.2531  ±0.0173 

P7 

0.0768  ±0.0091 

0.0779  ±  0.0095 

0.1805  ±  0.0083 

0.1803  ±0.0162 

P8 

0.0579  ±0.0085 

0.0642  ±  0.0089 

0.1033  ±0.0078 

0.1374  ±0.0153 

P9 

0.0748  ±  0.0082 

0.0700  ±  0.0086 

0.1854  ±0.0073 

0.1692  ±0.0144 

PlO 

0.0800  ±0.0071 

0.0798  ±  0.0074 

0.2783  ±  0.0068 

0.2685  ±0.0134 

Pll 

0.0765  ±  0.0063 

0.0780  ±  0.0066 

0.2783  ±  0.0062 

0.2879  ±0.0121 

Pl2 

0.0447  ±  0.0054 

0.0446  ±  0.0056 

0.1811  ±0.0061 

0.1712  ±0.0119 

Pl3 

0.0230  ±  0.0045 

0.0231  ±0.0047 

0.0663  ±  0.0053 

0.0793  ±0.0103 

Pl4 

0.0123  ±0.0043 

0.0091  ±0.0045 

0.0147  ±0.0049 

0.0022  ±0.0096 

Pl5 

0.0121  ±0.0053 

0.0120  ±0.0055 

0.0025  ±  0.0078 

0.0261  ±0.0152 

Table  4:  Estimated  probability  density  values  and  confidence  intervals  for  true  bi-gaussian  distribution  for 
DEL(15)  and  SPL(15,200)  with  5%  and  10%  absolute  error 


absolute  error.  We  also  see  for  M  =  15  as  we  saw  for  M  =  9  that  the  resulting  confidence  intervals 
are  larger  when  the  data  is  noisier.  As  discussed  in  the  case  for  M  =  9,  the  spline  based  approximation 
method  also  appears  to  be  slightly  sensitive  to  noisy  data.  To  summarize,  we  note  based  on  the  standard 
error  analysis  discussed  in  this  section  and  computational  results  (those  presented  here  as  well  as  those 
obtained  for  M  =  5)  we  can  conclude  that  DEL(M)  appears  to  be  insensitive  to  noisy  data.  Moreover,  we 
can  state  that  we  are  confident  about  the  estimated  growth  rate  distributions  obtained  using  this  method. 
We  also  conclude  that  SPL(M,N)  appears  to  be  relatively  insensitive  to  noisy  data.  Furthermore,  we 
would  feel  certain  about  the  estimated  growth  rate  distributions  obtained  using  SPL(M,N)  with  data 
with  small  amounts  of  noise;  however,  we  would  infer  that  larger  amounts  of  noise  in  the  data  would 
lead  to  larger  confidence  intervals  and  less  certainty  in  the  associated  estimated  growth  rate  distributions 
obtained  using  SPL(M,N). 

4  Computational  Results  for  Inverse  Problems  with  Field  Data 

In  this  section,  we  will  present  and  discuss  the  results  of  the  inverse  problem  for  the  estimation  of  growth 
rate  distributions  using  the  delta  function  approximation  method,  the  spline  based  approximation  method, 
and  a  parameterized  ordinary  least  squares  method.  We  use  field  data  collected  from  rice  paddies  in  the 
place  of  the  simulated  data.  Since  the  actual  growth  rate  distribution  of  the  mosquitofish  observed  in  the 
experiment  is  unknown,  we  must  compare  the  field  data  to  the  estimated  population  data  produced  by 
the  estimated  growth  rate  distribution  from  each  of  these  methods  in  order  to  compare  the  efficacy  of 
these  methods. 

In  these  numerical  simulations,  we  assume  that  the  growth  rate  of  the  mosquitofish  is  now  para¬ 
meterized  by  both  the  intrinsic  growth  rate  b  and  the  maximum  size  7,  where  g(x)  =  b( 7  —  x).  The 
collection  of  growth  rates  for  the  DEL  method  is  now  given  by  G  =  {7,7, } .  for  j  =  1, . . . ,  Mi,  and 
k  =  1, . .  • ,  M-2,  where  gjj-  =  (7(7/0  —  x)  with  {bj}  and  {7/0}  the  discretizations  of  B  and  T,  respec¬ 
tively.  As  earlier  stated,  in  the  GRD  model  mosquitofish  are  grouped  together  in  the  same  subpopulation 
if  they  have  the  same  growth  rate  gjk-  We  assume  here,  as  we  also  assumed  in  our  earlier  computations. 
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that  n  =  0  and  K  =  0,  so  that  we  can  focus  on  the  growth  rate  distribution  only  (mortality  and  fecundity 
were  not  thought  to  be  important  features  of  the  experimental  data  of  [4,  10]). 

The  field  data  that  we  are  using  in  the  inverse  problem  was  collected  in  an  experiment  described 
in  [10].  On  June  28,  1992,  four  rice  paddies  were  stocked  with  mosquitofish.  In  order  to  measure 
emigration,  an  outflow  trap  was  placed  on  each  paddy.  Fifteen  traps  were  used  per  paddy,  and  weekly 
measurements  were  taken.  The  length  of  the  mosquitofish  range  from  0  to  40  mm,  with  the  mosquitofish 
being  grouped  into  size  classes  of  2  mm  for  a  total  of  20  size  classes.  The  data  for  Day  195,  Day  202, 
Day  209,  and  Day  216  are  used  in  these  simulations  (see  Figure  7).  We  define  the  size  distribution 
frequency  for  size  class  i  as  f,  =  n„Vi /Nm ,  where  nm  *  is  the  number  of  mosquitofish  measured  in  size 
class  i  and  Nrn  is  the  total  number  of  mosquitofish  measured.  The  total  population  of  mosquitofish  is 
divided  into  512  subpopulations.  We  note  that  the  discretizations  for  the  intrinsic  growth  rates  b  and  the 
maximum  sizes  7  are  defined  as 

bj  =  0.2  +  •  4.8  •  (j  —  1),  j  =  1,2, ...  ,32 

16  22  24 

71  ~~  38’  72  ~  38’  73  -  38 

16  1  22 

7 k  =  —  + - (fc-1),  fc  =  4, . . . ,  16. 

1  38  15  38  v  ; 

The  Day  195  data  is  interpolated  and  used  an  approximation  for  the  initial  size  density  $(x).  Since  this 
data  set  is  used  as  an  approximation  for  (\>(x).  it  cannot  be  used  in  the  estimation  of  the  growth  rate 
distributions.  Therefore,  we  are  left  with  only  three  data  sets  to  use  in  the  inverse  problem. 

We  introduced  the  delta  function  approximation  method,  DEL(M),  earlier  when  the  growth  rate  is 
parameterized  by  b  only.  Since  we  are  now  considering  a  growth  rate  parameterized  by  b  and  7,  the 
approximated  population  density  for  u(t,  x:  P)  in  (2)  is  now  given  by 

u{t,  x;  { pjk })  =  ^2  v(t,  x;  9jk)Pjk, 

j,k 

where  v(t,  x;  gjk)  is  the  subpopulation  density  from  (1)  with  growth  rate  g3k  and  p3k  is  the  corresponding 
probability  that  an  individual  has  growth  rate  7.7..  We  will  now  use  the  notation  DEL(A/i ,  M2)  to  denote 
the  delta  function  approximation  method,  where  Mi  is  the  number  of  intrinsic  rates  bj  and  M2  is  the 
number  of  maximum  sizes  77,.  used  in  the  approximation. 

The  spline  based  approximation  method,  SPL(M,N),  was  also  introduced  earlier  for  the  one  para¬ 
meter  family  of  growth  rates.  For  the  two  parameter  family  of  growth  rates  that  we  are  now  using,  the 
approximated  population  density  for  u(t,  x:  P)  in  (2)  is  given  by 


u(t,x\  {ajfc})  =  /  /  v(t,x\g)lj(b)lk('y)d'ydb. 

j,k  JB  JV 


where  g  =  g[x\  b,  7)  =  6(7  —  x)  and  p3k(b ,  7)  =  a3klj(b)lk{ 7)  is  the  probability  density  for  individuals 
in  population  subgroup  jk  with  l3,  Ik  piecewise  linear  spline  functions.  The  notation  that  we  employ 
here  is  SPL(M! ,  N-\ ,  M2 ,  N2),  where  A7  |  and  M2  are  the  number  of  basis  elements  used  to  approximate 
the  growth  rate  probability  distribution  with  respect  to  b  and  7,  and  Ni  and  N2  represent  the  number 
of  quadrature  nodes  used  in  the  composite  trapezoidal  rule  [16]  for  double  integrals  to  approximate  the 
integral  in  the  expression  above  with  respect  to  b  and  7. 

We  next  present  the  results  of  the  approximation  methods  DEL(Mi ,  M2)  and  S  PL( M\ ,  N  \ ,  M2 ,  N2), 
which  we  used  in  the  inverse  problem  for  the  estimation  of  the  growth  rate  distribution  for  the  field  data. 
The  addition  of  a  second  parameter  in  the  growth  rate  g  of  the  mosquitofish  does  not  change  the  fact  that 
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the  least  squares  inverse  problem  that  we  want  to  solve 

min  J(P)  =  Y''  \u(ti,Xj\P)  -  u(ti,Xj)\2 

pe-pM1xM2rG)  J 

—  ^  tCj.  P))  2u(ti,  Xj,  P)u(ti,  Xj)  T  (u(ti,Xj))  , 

ij 

where  u(i,  at)  is  the  data  and  PMl  xM2(G)  is  the  finite  dimensional  approximation  to  V(G),  simplifies 
to  a  quadratic  programming  problem  for  an  appropriately  defined  F(p)  =  pTAp  +  2p7  b  +  c. 

In  Figure  7,  we  have  the  results  from  the  inverse  problem  using  DEL(32,16).  These  results  were 
obtained  in  514.3600  seconds,  and  the  corresponding  residual  J  =  8.3169  x  10-4.  We  see  from  the 
results  shown  in  Figure  7  that  the  estimated  population  is  a  good  fit  to  the  field  data.  The  two  key 
features  of  the  data,  dispersion  and  bifurcation,  are  both  exhibited  in  the  estimated  population.  The 
corresponding  estimated  probability  density  and  distribution  are  shown  in  Figure  8.  While  no  useful 
information  can  be  obtained  from  Figure  8a,  the  estimated  probability  distribution  in  Figure  8b  appears 
to  be  Bi-Gaussian  in  both  b  and  7,  which  is  expected  based  on  results  from  [9]  and  [10], 


Field  Data  -  Day  195 


Field  Data  versus  Estimated  Population  -  Day  202 


Figure  7:  Field  data  versus  estimated  population  for  DEL(32,16) 

The  best  results  obtained  using  SPL(Ml5  N\,  M2,  IV2)  for  the  estimation  of  the  growth  rate  dis¬ 
tribution  of  the  field  data  are  shown  in  Figure  9  for  SPL(5,35,9,35).  We  note  that  the  results  ob¬ 
tained  using  SPL(5,35,5,35)  and  SPL(5,35,7,35)  were  approximately  the  same  as  those  obtained  us¬ 
ing  SPL(5,35,9,35)  for  Day  202  and  Day  209.  However,  the  estimated  population  data  obtained  us¬ 
ing  SPL(5,35,9,35)  gave  a  much  better  fit  to  the  data  for  Day  216  than  the  results  obtained  using 
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0.3^, 


Figure  8:  a)  Estimated  probability  density  and  b)  estimated  probability  distribution  for  DEL(32,16) 


SPL(5,35,5,35)  and  SPL(5,35,7,35).  The  corresponding  residual,  J,  for  SPL(5,35,9,35)  is  0.0054,  and 
these  results  were  obtained  in  1.8822  x  103  seconds,  or  approximately  32  minutes.  In  comparison  to 
the  estimated  population  data  produced  by  the  estimated  growth  rate  distribution  from  DEL(32,16),  the 
estimated  population  data  produced  by  the  estimated  growth  rate  distribution  from  SPL(5,35,9,35)  does 
not  give  as  good  a  fit  to  the  field  data,  as  is  seen  in  Figure  9.  The  estimated  probability  density  and 
distribution  are  shown  in  Figure  10.  The  resulting  estimated  probability  distribution  appears  to  be  Bi- 
Gaussian  in  7.  In  contrast  to  the  results  obtained  in  the  previous  examples  with  simulated  data,  the  delta 
function  approximation  method  does  a  better  job  of  fitting  the  given  field  data  in  a  more  efficient  way  in 
comparison  to  the  spline  based  approximation  method. 

Since  the  results  from  SPL(M1;  N-t ,  M2,  N2)  were  not  as  good  as  those  obtained  from  DEL(M1;  M2), 
we  tried  one  more  method  in  the  inverse  problem  with  the  field  data.  Based  on  previous  work  in  [9]  and 
[10]  and  our  own  numerical  simulations  with  simulated  data,  we  know  that  a  Bi-Gaussian  growth  rate 
distribution  results  in  population  density  data  with  the  two  key  features  of  dispersion  and  bifurcation.  The 
field  data  that  we  are  using  in  these  computations  exhibit  these  features  as  well,  so  we  suspect  that  the  un¬ 
derlying  growth  rate  distribution  is  Bi-Gaussian.  With  that  in  mind,  instead  of  approximating  the  density 
with  delta  functions  or  piecewise  linear  splines,  we  chose  to  use  a  parametric  Bi-Gaussian  probability 
density  function  in  the  growth  rate  distribution  (GRD)  model.  The  estimated  {pjk}  and  {cijk}  found 
via  the  DEL(A/| ,  M>)  and  SPL(Mi,  N  \ .  M2 ,  N2)  methods,  respectively,  were  not  readily  interpreted  in 
terms  of  the  actual  mosquitofish  growth  rate  and  maximum  size  means  and  variances.  However,  by  us¬ 
ing  an  a  priori  Bi-Gaussian  probability  density  function  in  the  GRD  model,  we  have  in  essence  taken 
a  standard  parametric  approach  to  the  statistical  inverse  problem.  The  Bi-Gaussian  probability  density 
function  p  we  choose  is  given  by 


where  we  have  assumed  b  and  7  to  be  independent  Bi-Gaussian  random  variables.  The  parameters 
(61,  b2)  and  (<7^  ,  er^J  represent  the  means  and  variances,  respectively,  of  a  Bi-Gaussian  distribution  on 
the  intrinsic  rates  b ,  while  the  parameters  (71 , 72)  and  (cr?^  ,  a^2 )  represent  the  means  and  variances  of  a 


22 


Field  Data  -  Day  195 


Field  Data  versus  Estimated  Population  -  Day  202 


Figure  9:  Field  data  versus  estimated  population  for  SPL(5,35,9,35) 


Figure  10:  a)  Estimated  probability  density  and  b)  estimated  probability  distribution  for  SPL(5, 35,9,35) 


Bi-Gaussian  distribution  on  the  maximum  sizes  7.  We  will  define  q  =  (bi,  cr^  ,  62 ,  cr^ ,  71 ,  <7^ ,  72 ,  cr^2 )  . 
Our  third  approach  will  not  use  an  approximation  to  the  GRD  model,  but  instead  use  the  Bi-Gaussian 
probability  density  function,  given  above,  in  the  GRD  model 


u(t,x;  q)  =  /  /  v(t,x;g(b,i))p(b,'r,q)d'ydb, 
J b  Jr 
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where  again  g(x;  b,  7)  =  b( 7  —  a;).  We  will  denote  this  third  approach  as  PAR (M,  N-t ,  7V2),  where  M  is 
the  number  of  parameters  in  q  and  N\  and  N->  represent  the  number  of  quadratures  used  in  the  composite 
trapezoidal  rule  [16]  to  approximate  the  double  integral  with  respect  to  b  and  7,  respectively.  We  will 
estimate  q  by  solving  the  ordinary  least  squares  problem 

min  J(q)  =  V'  \u(U,  at,-;  q)  -  u(U,  Xj)\2, 

q£MM  z ' 

where  u(t,  x)  is  the  data.  Once  this  least  squares  problem  has  been  solved,  we  can  use  the  optimal  q  in 
the  Bi-Gaussian  probability  density  function  to  determine  the  population  density  u(t,  x:  q). 

The  optimal  results  for  the  inverse  problem  using  PAR(8,35,35)  are  shown  in  Figure  1 1 .  The  optimal 
parameters  q  =  (1.9749, 0.0388  x  10"3, 3.9132, 0.2228  x  10"3, 0.5265, 0.7208, 0.0122, 0.0372),  with 
a  residual  J  =  0.0074,  were  determined  in  20.5490  seconds.  In  comparison  to  the  results  obtained  using 
DEL(32,16)  and  SPL(5,35,9,35),  the  estimated  population  density  does  not  fit  the  field  data  as  well  as 
the  estimated  population  density  produced  by  DEL(32,16)  but  is  comparable  to  those  obtained  using 
SPL(5,35,9,35).  We  note  that  the  spline  based  approximation  method  does  a  better  job  of  estimating 
the  frequencies  for  the  smaller  size  classes  than  the  parameterized  OLS  technique.  While  the  results 
from  the  spline  based  approximation  method  and  the  parameterized  OLS  method  are  very  similar,  the 
computational  time  required  by  PAR(8,35,35)  is  much  lower  (compare  the  32  minutes  for  SPL(5,35,9,35) 
versus  21  seconds  for  PAR(8,35,35))  than  the  computational  time  required  by  SPL(5,35,9,35).  The 
estimated  probability  density  and  probability  distribution  generated  by  the  optimal  q  are  shown  in  Figure 
12.  We  clearly  see  for  a  fixed  value  of  7  the  probability  distribution  of  b  is  Bi-Gaussian. 

As  we  did  previously,  we  would  like  to  also  present  here  some  results  on  the  uncertainty  associated 
with  the  estimated  growth  distributions  determined  by  the  inverse  problem.  The  treatment  in  this  section 
with  the  field  data  is  very  similar  to  the  treatment  previously  carried  out  with  the  simulated  data.  With 
respect  to  the  optimal  results  given  in  this  section,  we  will  only  be  able  to  perform  a  statistical  analysis 
for  SPL(5,35,9,35)  and  PAR(8,35,35).  We  are  unable  to  perform  this  analysis  for  DEL(32,16)  because 
the  field  data  consists  of  60  data  points  and  the  number  of  parameters  determined  by  the  DEL(32,16)  is 
512;  thus,  the  analysis  in  this  case  is  invalid.  Since  we  have  explained  in  detail  the  underlying  statis¬ 
tical  model  that  we  are  considering,  we  will  omit  these  details  here  and  define  functions  and  variables 
that  are  used  with  respect  to  SPL(Mi,  Ni,  M2,  N2)  and  PAR(M,  Ni,  _ZV2).  First,  we  note  that  {xi}™=1 
corresponds  to  ( ti,xm ),  l  =  1, . . . ,  3,  m  =  1, . . . ,  20  pairs  since  the  field  data  that  we  use  in  the  inverse 
problem  consists  of  three  days  and  twenty  size  classes  (hence  n  =  60),  9  =  {ajk}fIk~il2  when  using 
SPL(Mi,  N\,M2,  N2),  and  9  =  q  when  using  PAR(M,  N\,  N2).  We  also  note  that 

M±  M2 

f  ( Xi ,  o)  =  y  '  y  ( a,jk 

j=i  *;= 1 

where  9  =  {djk}  when  considering  SPL(M1;  N-\ ,  M2,  _/V2).  When  considering  PAR(M,  Ni,N2), 


/  /  v(xi\g)lj(b)lk(^)d'ydb, 
Jb  Jr 


f(xi,9)=  /  /  v(xi]  g)p{b,  7;  q)d'ydb. 

Jb  Jr 


We  will  define  M  from  our  previous  analysis  to  be  M \  ■  M2  for  SPL(Mi ,  N\ ,  M2 ,  W2)  and  M  (the  num¬ 
ber  of  parameters  in  q)  for  PAR(Af,  N\,  iV2).  For  SPL(Mi,  N\,  M2,  A72),  the  entries  in  the  sensitivity 
matrix  X  (9)  are  given  by 


Xim{9)  = 


df(xj,9) 

d9m 


v(xi\ g)lj{b)lk(l)d^db. 


b  Jr 


For  the  parameterized  OLS  method,  the  entries  in  the  sensitivity  matrix  X(9)  are  given  by 


Xim{9)  = 


df(xj,9) 
09  rn 


b  Jr 


<-  N5p(6,7;6») 

v(Xi;  g)  — wr - d'jdb, 

OUm, 
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Field  Data  -  Day  195 


Field  Data  versus  Estimated  Population  -  Day  202 


Figure  11:  Field  data  versus  estimated  population  for  PAR(8, 35,35) 


Figure  12:  a)  Estimated  probability  density  and  b)  estimated  probability  distribution  for  PAR(8,35,35) 


and  we  are  able  to  analytically  compute  dpj^~f,9) ,  m  _  l5 . . . ;  M.  Using  these  facts,  we  are  able  to 
estimate  the  covariance  matrix  E,  which  we  use  to  determine  the  standard  errors  and  confidence  intervals 
for  the  6.  As  we  stated  before,  the  endpoints  of  the  confidence  intervals  are  given  by 

d±fW2SE(0), 


25 


where  fi_a /2  is  a  distribution  value  given  in  the  t-distribution  statistical  table  (determined  by  the  level 
of  significance  chosen  [24]).  As  before,  we  also  chose  to  use  a  significance  level  of  95%  corresponding 
to  a  =  0.05.  From  the  statistical  table  for  the  t-distribution,  a  =  0.05  corresponds  to  1.96  when  the 
number  of  degrees  of  freedom  is  large,  i.e.  n  >  30  which  is  true  in  this  case. 

For  SPL(5,35,9,35),  the  optimal  value  for  a-ji-  and  the  corresponding  confidence  interval  are  given 
in  Table  5.  As  clearly  seen  by  the  values  given  in  this  table,  the  confidence  intervals  are  very  large 
in  comparison  to  the  value  of  9.  Based  on  these  confidence  intervals,  we  are  not  very  confident  about 
estimated  growth  rate  distributions  obtained  using  SPL(5,35,9,35)  in  the  inverse  problem  with  this  field 
data.  Moreover,  we  found  that  as  the  quantity  Mi  x  M2  became  larger  (while  still  remaining  smaller  than 
n ),  X1(0)X(0)  became  nearly  singular,  resulting  in  a  very  ill-conditioned  covariance  matrix  S.  This,  in 
turn,  resulted  in  larger  confidence  intervals,  which  implied  that  the  estimated  growth  rate  distributions 
produced  by  the  spline  based  approximation  method  were  even  more  unreliable  with  respect  to  the  field 
data.  Furthermore,  we  were  unable  to  compute  confidence  intervals  for  the  (Mi, M2)  pairs  (5,11)  and 
(1 1,5)  because  the  covariance  matrix  £  did  not  exist  in  these  cases  (XT  (9)X(9)  was  singular  to  machine 
accuracy). 
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Table  5:  Estimated  values  ajk  and  confidence  intervals  for  SPL(5,35,9,35)  with  field  data 

In  Table  6,  the  optimal  parameter  values  for  q  with  the  corresponding  confidence  intervals  are  given 
for  PAR(8,35,35).  We  see  that  the  confidence  intervals  computed  with  respect  to  bi ,  ,  b2 ,  cr^ ,  i.e. 
those  associated  with  the  individual  growth  rate  b,  are  relatively  large  in  comparison  to  the  optimal 
value  obtained  from  this  method  for  these  parameters.  Therefore,  we  can  not  be  very  certain  when 
reporting  values  for  these  particular  parameters  when  using  PAR(8,35,35).  However,  we  see  that  the 
confidence  intervals  obtained  for  71 ,  ,  72 ,  tr^2 ,  he.,  those  associated  with  the  maximum  size  7,  are 

very  small  in  comparison  to  the  optimal  values  obtained  using  this  method.  Based  on  this  analysis, 
we  feel  more  certain  about  the  estimates  obtained  for  these  parameters  because  of  the  much  smaller 
confidence  intervals  associated  with  these  parameters. 

Overall,  we  found  that  the  estimated  growth  rate  distributions  obtained  using  DEL(32,16)  produced 
the  best  fit  to  the  field  data  in  comparison  to  both  SPL(5,35,9,35)  and  PAR(8,35,35).  We  also  found  the 
results  produced  by  PAR(8,35,35)  were  very  much  comparable  to  those  obtained  using  SPL(5,35,9,35) 
in  a  more  efficient  manner  in  terms  of  computational  time.  From  the  sensitivity  analysis  done  on  the 
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Parameter 

0±h-a/2  SE(9) 

bi 

1.9749  ±395.3664 

< 

0.0388  x  10~3  ±3.1450 

b2 

3.9132  ±  139.7600 

aL 

0.2228  x  10-3  ±  1.5226 

71 

0.5265  ±0.8471  x  10“4 

2 

< 

0.0122  ±0.2982  x  10“4 

72 

0.7208  ±0.1426  x  10“3 

2 

< 

0.0372  ±0.8201  x  10~4 

Table  6:  Estimated  parameters  for  Bi-Gaussian  p(b.  7)  and  confidence  intervals  for  PAR(8,35,35)  with  field 
data 


estimates  obtained  from  SPL(5,35,9,35)  and  PAR(8,35,35),  we  observed  that  for  this  inverse  problem 
the  estimates  from  the  spline  based  approximation  method  are  not  very  reliable  (based  on  the  resulting 
large  confidence  intervals).  We  are  more  certain  about  the  parameters  related  to  7  when  using  the  pa¬ 
rameterized  OLS  than  the  parameters  related  to  b.  Overall,  based  on  the  fit-to-data  and  computational 
time,  the  delta  function  approximation  provides  the  best  estimates  of  the  growth  rate  distribution  for  the 
field  data  in  this  example. 


5  Concluding  Remarks 

In  this  paper  we  have  presented  computational  and  statistical  comparisons  of  two  “finite-element”  type 
approximation  schemes  for  estimation  of  probability  distributions  arising  in  problems  with  aggregate  or 
population  level  observations.  One  approximation  scheme,  DEL,  is  at  the  level  of  the  distribution  being 
sought,  while  the  other  is  at  the  level  of  the  density  (tacitly  assumed  to  exist  in  the  underlying  theory 
[12])  associated  with  the  distribution.  Strengths  and  weaknesses  are  illustrated  in  several  computational 
examples  with  simulated  data  in  the  context  of  functional  growth  rate  estimation  in  size-structured  pop¬ 
ulation  models.  Finally,  the  most  recently  developed  scheme,  SPL,  is  used  with  experimental  data  for 
mosquitofish  populations  and  compared  to  earlier  findings  with  the  DEL  scheme.  While  the  results  here 
are  specific  to  size-structured  population  models,  the  ideas  are  widely  applicable  to  other  areas  of  appli¬ 
cations  in  biology  as  well  as  in  viscoelastic  materials  and  in  electromagnetic  interrogation  of  dielectric 
materials. 
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